Method and server cluster for map reducing flow services and large documents

ABSTRACT

The present invention relates to a method for MapReducing the processing of an Electronic Data Interchange (EDI) document (1), the method comprising the following steps:

-   a. mapping the EDI document (1) into a plurality of intermediate documents (10, 11);
-   b. processing the intermediate documents (10, 11) to produce a plurality of intermediate results (20-23);
-   c. reducing the plurality of intermediate results (20-23) to produce a plurality of reduced intermediate results (30, 31); and
-   d. reducing the reduced intermediate results (30, 31) to produce a final result (2) representing the processed EDI document (1).

1. TECHNICAL FIELD

The present invention relates to a method, a server cluster and a computer program for MapReducing the processing of large documents, for example Electronic Data Interchange (EDI) documents.

2. THE PRIOR ART

Modern software applications in an enterprise environment are typically structured into sub-programs, each performing certain subtasks of the software application. Such applications typically have to process huge amounts of data, for example in the field of communication between applications of different enterprises, where large documents have to be sent and processed.

Such applications are often executed on integration servers, an example of which is the webMethods Integration Server of the applicant. The Integration Server supports a graphical programming model, FLOW, which is used for defining the processing logic of an Integration Server. FLOW allows for the graphical definition of a plurality of FLOW services as “black box” services, as well as of pipelines between the FLOW services, which pass data from the outputs of one FLOW service to the inputs of another FLOW service. Since FLOW is a graphical programming language, it relieves the developer of writing complex and error-prone conventional code. FLOW services may be used for processing any kind of information and for performing various kinds of computations.

A common approach known from the prior art for processing large documents by an Integration Server is to process the contents of the document sequentially. However, since such documents may be gigabytes in size, sequential processing is very time-consuming and processing-intensive and may require special high-end hardware, whose maintenance is costly and complex.

Another approach known from the prior art is to employ a broker, which distributes the large documents to instances of an Integration Server in order to achieve some parallel processing. However, this approach requires additional and often complex messaging middleware for the communication between the broker and the Integration Server instances, which typically imposes high network bandwidth requirements and results in a high consumption of resources. Furthermore, although this approach distributes multiple large documents among the broker and the Integration Server instances, each large document is still processed sequentially by a single Integration Server.

Furthermore, in the field of processing large input data sets, a programming model and associated framework called MapReduce is known from the document “MapReduce: Simplified Data Processing on Large Clusters” by J. Dean et al. of Google, Inc. (OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, December 2004). A user-defined map function takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to a user-defined reduce function. The reduce function accepts an intermediate key and a set of values and merges these values to form a possibly smaller set of values; typically zero or one output value is produced per reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator, which allows for handling lists of values that are too large to fit in memory. Programs written in this programming model may be automatically executed in parallel on different machines by the framework. However, applying the MapReduce programming model to an existing application requires an in-depth adaptation of the application's programming logic to conform to the model. Furthermore, MapReduce is intended for the field of search engines, where specialized tasks such as counting words in huge collections of documents or building graph structures of web links are common.

One concrete example of the processing of large documents is Electronic Data Interchange (EDI). EDI relates to the transmission of structured messages between applications by electronic means. EDI is typically used to transmit large documents such as invoices or purchase orders between applications of different enterprises. A number of standardized formats for the structured messages are known in the art, e.g. ANSI X12, UCS, VICS, UN/EDIFACT, ODETTE and EANCOM. Processing such large EDI documents typically involves the above-mentioned disadvantages.

The technical problem underlying the present invention is therefore, in one aspect, to provide a method and a system for processing large documents, in particular EDI documents, with less processing time and computing effort, thereby at least partly overcoming the above-explained disadvantages of the prior art. A related technical problem underlying the present invention is to provide a method and a system for processing the input of a FLOW service with less processing time and computing effort, which can furthermore be flexibly applied to existing programming logic with minimal adaptation effort.

3. SUMMARY OF THE INVENTION

According to one aspect, the invention relates to a method for MapReducing the processing of an Electronic Data Interchange (EDI) document. In the embodiment of claim 1, the method comprises the steps of:

-   a. mapping the EDI document into a plurality of intermediate documents;
-   b. processing the intermediate documents to produce a plurality of intermediate results;
-   c. reducing the plurality of intermediate results to produce a plurality of reduced intermediate results; and
-   d. reducing the reduced intermediate results to produce a final result representing the processed EDI document.

The first aspect of the present invention is based on the realisation that the concept of MapReducing can be used not only in the context of search engines but also, advantageously, for the processing of EDI documents in an enterprise environment. Accordingly, a large EDI document is first mapped, i.e. split, into multiple intermediate documents. The mapping is preferably performed such that the resulting intermediate documents have approximately equally sized payloads, i.e. so that each consumes a comparable amount of processing time and/or processing resources when processed in the further steps of the method.

The intermediate documents are then processed to produce a plurality of intermediate results, preferably in parallel, to reduce the overall processing time. Furthermore, since the EDI document is mapped to a plurality of typically smaller intermediate documents, the intermediate documents may be processed by commodity hardware, i.e. there is no need to employ specialized high-end hardware.

After the processing of the intermediate documents, the resulting intermediate results are reduced to produce a plurality of reduced intermediate results. Reducing means collating the related intermediate results into one reduced intermediate result. Related in this context means that two or more intermediate results stem from the same original EDI document.

Finally, the reduced intermediate results are reduced in a further step to produce a final result. This step typically involves suitably combining the reduced intermediate results in order to obtain the final result of the processing of the EDI document.

The described two-step reducing is especially advantageous if the reducing steps are performed on different physical machines in order to achieve parallelization. In this case, since the intermediate results are already reduced before being sent to another machine, which performs the second reducing step, valuable network bandwidth is saved, because fewer results have to be transmitted between the machines. This aspect is furthermore especially advantageous if the reducing operation is commutative (i.e. A operation B is equivalent to B operation A) and/or associative (i.e. A operation (B operation C) is equivalent to (A operation B) operation C), since the reducing steps may then be performed in parallel and in any order. Another advantage of two-step reducing is that the load, i.e. the processing time for performing the reduce step, may be shared between commodity machines, rather than one machine performing the reduce step over a large set. The second reduce step is thus performed over a smaller set of intermediate results.

In another aspect of the present invention, the EDI document (1) may be mapped such that each of the intermediate documents (10, 11) comprises at least one of a plurality of interchange envelopes, at least one of a plurality of functional group envelopes and/or at least one of a plurality of transaction set envelopes of the EDI document (1). Accordingly, the mapping, i.e. splitting, may be performed at one of the boundaries defined by the structure of the EDI document, i.e. at the transaction set envelope level, the functional group envelope level and/or the interchange envelope level. It typically depends on the end user at which of these boundaries the EDI document is split, based on its structure. For example, the document may be split at the functional group or interchange envelope level if these envelopes contain the optimum number of transactions to define a reasonably sized payload.
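The boundary-based mapping can be sketched in Java, the language the Integration Server itself is built on. This is a minimal illustration, not the patented implementation: the class name `EnvelopeSplitter` is hypothetical, and the sketch assumes that the segment terminator and element separator are already known and that segments outside the chosen envelope are simply dropped (a production splitter would typically also copy the enclosing ISA/GS headers into each chunk).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical splitter: one chunk per envelope, from its opening to its closing segment. */
final class EnvelopeSplitter {
    static List<String> split(String edi, char segmentTerminator, char elementSeparator,
                              String openTag, String closeTag) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null; // null while we are outside an envelope of interest
        String terminator = Pattern.quote(String.valueOf(segmentTerminator));
        for (String raw : edi.split(terminator)) {
            String segment = raw.strip(); // tolerate line breaks between segments
            if (segment.isEmpty()) continue;
            int sep = segment.indexOf(elementSeparator);
            String id = sep < 0 ? segment : segment.substring(0, sep); // segment ID, e.g. "ISA"
            if (id.equals(openTag)) {
                current = new StringBuilder(); // a new envelope starts a new chunk
            }
            if (current != null) {
                current.append(segment).append(segmentTerminator);
                if (id.equals(closeTag)) {
                    chunks.add(current.toString()); // envelope complete: emit chunk
                    current = null;
                }
            }
        }
        return chunks;
    }
}
```

Passing `"ISA"`/`"IEA"` splits at interchange boundaries, `"GS"`/`"GE"` at functional group boundaries and `"ST"`/`"SE"` at transaction set boundaries.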

Steps a. and d., i.e. the mapping and the final reducing, may be performed by a master server of a server cluster, and steps b. and c., i.e. processing the intermediate documents and reducing the intermediate results, respectively, may be performed by a plurality of slave servers of the server cluster. Each slave server may process one or more intermediate documents and reduce one or more intermediate results. Performing the processing by a plurality of slave servers is especially advantageous, since the processing of the intermediate documents can be highly parallelized. The master and slave servers are preferably distinct physical machines communicating over a network connection.

For example, if the processing task is to add the list of numbers {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, the master server may delegate the intermediate documents {1, 2} to slave node 1, {3, 4} to slave node 2, {5, 6} to slave node 3, {7, 8} to slave node 1, {9, 10} to slave node 2 and {11, 12} to slave node 3. At slave node 1, the intermediate results would then be the sums of the intermediate documents: 3 for {1, 2} and 15 for {7, 8}. The reduce step on slave node 1 would then add 3 and 15, resulting in 18. Accordingly, the reduce step on slave node 2 would add 7 and 19 to yield 26, and slave node 3 would add 11 and 23 to yield 34. Consequently, only three reduced intermediate results, 18, 26 and 34, would have to be transferred back to the master server, instead of the six values 3, 15, 7, 19, 11 and 23. The final reduce step performed on the master server would then yield 78 (18+26+34), which is the desired result.
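The arithmetic of this example can be reproduced with a short, single-process Java simulation. `TwoStepSumExample` and its methods are hypothetical names; the "slaves" are merely map keys, and addition serves as the reducing operation because it is both commutative and associative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simulates the sum example: round-robin delegation, per-slave reduce, final reduce on the master. */
final class TwoStepSumExample {
    /** Steps b and c: returns each slave's reduced intermediate result, keyed by slave number (1-based). */
    static Map<Integer, Integer> reducePerSlave(List<int[]> intermediateDocuments, int slaveCount) {
        Map<Integer, Integer> reduced = new TreeMap<>();
        for (int i = 0; i < intermediateDocuments.size(); i++) {
            int slave = i % slaveCount + 1; // round-robin delegation
            int intermediateResult = 0;     // step b: process one intermediate document
            for (int n : intermediateDocuments.get(i)) intermediateResult += n;
            reduced.merge(slave, intermediateResult, Integer::sum); // step c: reduce on the slave
        }
        return reduced;
    }

    /** Step d: the master combines only the per-slave values. */
    static int finalReduce(Map<Integer, Integer> reducedIntermediateResults) {
        int total = 0;
        for (int r : reducedIntermediateResults.values()) total += r;
        return total;
    }

    /** Step a (simplified): builds the pairs {from, from+1}, {from+2, from+3}, ... up to "to". */
    static List<int[]> pairs(int from, int to) {
        List<int[]> docs = new ArrayList<>();
        for (int n = from; n < to; n += 2) docs.add(new int[] {n, n + 1});
        return docs;
    }
}
```

For the list above, `reducePerSlave(pairs(1, 12), 3)` yields {1=18, 2=26, 3=34} and `finalReduce` of that map yields 78; only the three per-slave values would have to cross the network.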

In another aspect, the method may further comprise the step of sending the intermediate documents from the master server to the slave servers by an asynchronous invocation. Accordingly, the master server takes the large EDI document and delegates the processing of the intermediate documents to the slave servers. The EDI document itself preferably stays with the master server. Asynchronous invocation means that once the master server invokes, i.e. triggers, the processing of a slave server, which is preferably performed by a thread pool of the master server, the master server threads do not wait for the slave servers to finish their processing (which would be a synchronous invocation); instead, the master server may immediately proceed with its own processing, i.e. subsequently invoke further slave servers. This further increases the processing speed, since no master server resources are blocked waiting for the slave servers, which results in a faster delegation of tasks to the slave servers.
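A minimal sketch of the asynchronous delegation, assuming each slave invocation can be modelled as a task on a master-side thread pool (a real cluster would issue a remote call to the slave server instead). `AsyncDelegation` is a hypothetical name; `CompletableFuture.supplyAsync` is the standard Java API, which returns immediately without blocking the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical master-side delegation: hands out every intermediate document without waiting. */
final class AsyncDelegation {
    static <D, R> List<CompletableFuture<R>> delegateAll(List<D> intermediateDocuments,
                                                         Function<D, R> slaveProcessing,
                                                         ExecutorService pool) {
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (D doc : intermediateDocuments) {
            // supplyAsync returns at once; the master thread continues with the next document
            pending.add(CompletableFuture.supplyAsync(() -> slaveProcessing.apply(doc), pool));
        }
        return pending; // results are collected later, e.g. when the reduce calls are issued
    }
}
```

The master can issue all invocations first and only later `join()` the futures, which corresponds to issuing the reduce calls after delegation has finished.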

Alternatively, the EDI document may be stored in a distributed file system accessible to the slave servers, and the method may comprise the further step of sending, by the master server, a reference to the intermediate documents to the slave servers by an asynchronous invocation. If the slave servers are connected to the distributed file system over direct connections, this may speed up the processing considerably, since neither the EDI document nor the intermediate documents have to be sent over a slow network connection. In this case, only a reference to the EDI document and/or to the portions of the EDI document that are to be processed by the slave (i.e. the intermediate documents) has to be passed to the slave nodes. Co-location (i.e. providing a reference to portions that actually reside on the slave nodes themselves) is especially advantageous, since it saves considerable bandwidth: no EDI data is transferred between the machines.
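Passing a reference instead of the data can be as small as a path plus a byte range. The `ChunkRef` record below is a hypothetical illustration that assumes the distributed file system is mounted on the slave and readable through ordinary Java file APIs; actual distributed file systems usually come with their own client libraries.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical reference to an intermediate document: only path, offset and length travel. */
record ChunkRef(String path, long offset, int length) {
    /** Executed on the slave: reads just the referenced byte range of the shared file. */
    String read() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] buffer = new byte[length];
            file.seek(offset);
            file.readFully(buffer); // throws EOFException if the range exceeds the file
            return new String(buffer, StandardCharsets.US_ASCII);
        }
    }
}
```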

When processing the intermediate documents, the slave servers preferably store the intermediate results locally, either in memory or in a persistent file system, from where they are then collected by the master server.

Furthermore, each of the intermediate results may comprise an identifier relating the respective intermediate result to the EDI document, so that each intermediate result can be tracked back to the original invocation. The identifier may e.g. be a counter that is incremented with every original invocation with a large EDI document. The identifier enables asynchronous behaviour when the master server calls the slave servers. This frees the delegating threads at the master server (which in synchronous mode would have waited for the slave servers to complete their processing), thus leading to better resource utilization at the master server and indirectly to more parallelization. When the master server delegates the intermediate documents to the slave servers asynchronously, the slave servers thus have a means of tracking the intermediate results they produce back to the original invocation from the master server. For example, if a large EDI document is to be processed, an identifier “12345” may be created for the invocation. The method may pass this identifier to the slave servers while delegating the intermediate documents to them. Since the slave servers maintain the intermediate results under this identifier, all intermediate results can be related to the original EDI document in the subsequent reduce steps.
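The identifier mechanism might look as follows in Java. Both classes are hypothetical: a master-side counter hands out one identifier per original invocation, and a slave-side store keeps intermediate results under that identifier until the master's reduce call asks for them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Master side (hypothetical): a counter incremented with every original invocation. */
final class InvocationIds {
    private static final AtomicLong COUNTER = new AtomicLong();
    static long next() { return COUNTER.incrementAndGet(); }
}

/** Slave side (hypothetical): intermediate results kept per invocation identifier. */
final class SlaveResultStore<R> {
    private final Map<Long, List<R>> resultsByInvocation = new ConcurrentHashMap<>();

    /** Stores one intermediate result; safe to call from several worker threads. */
    void store(long invocationId, R intermediateResult) {
        resultsByInvocation
            .computeIfAbsent(invocationId, id -> Collections.synchronizedList(new ArrayList<>()))
            .add(intermediateResult);
    }

    /** Called on the master's reduce call: removes and returns that invocation's results. */
    List<R> drain(long invocationId) {
        List<R> results = resultsByInvocation.remove(invocationId);
        return results == null ? List.of() : results;
    }
}
```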

Additionally or alternatively, the processing logic adapted for performing the processing of the slave servers in step b. may be distributed to the slave servers at runtime. Accordingly, the slave servers do not need their own copies of the executables, i.e. the processing logic, that process the EDI document. The executables may for example be contained in a library of the master server and distributed to the slave servers at runtime. The distribution is preferably performed depending on the executables needed for the current EDI document. Any peer-to-peer framework or proprietary mechanism may be used to share the executables.

Furthermore, the present invention relates to a server cluster comprising a master server and a plurality of slave servers adapted for performing any of the methods presented above.

In yet another aspect of the present invention, a method for MapReducing the processing of at least one input of a FLOW service is provided. In the embodiment of claim 9, the method comprises the steps of:

-   a. mapping the at least one input of the FLOW service into a plurality of intermediate inputs by a mapper service;
-   b. executing a plurality of instances of the FLOW service, the instances of the FLOW service processing the intermediate inputs to produce a plurality of intermediate results;
-   c. reducing the intermediate results into a plurality of reduced intermediate results by a plurality of first reducer services; and
-   d. reducing the reduced intermediate results to produce a final output of the FLOW service from the reduced intermediate results by a second reducer service.

Accordingly, a FLOW service, whether newly created or existing, does not process its inputs sequentially; instead, the processing of the FLOW service is effectively “parallelized” by the above method. The inputs of the FLOW service are therefore not fed directly into the FLOW service as usual, but are first split by a mapper service into a plurality of intermediate inputs. In an embodiment, the FLOW service itself is “cloned”, i.e. the intermediate inputs are processed by a plurality of instances of the FLOW service, preferably in parallel. The resulting intermediate results are then reduced by a plurality of first reducer services in order to obtain one reduced intermediate result for each instance of the FLOW service. Finally, the reduced intermediate results are reduced by a second reducer service in order to provide the final output of the FLOW service. Preferably, the second reducer service is based on the same implementation as the plurality of first reducer services, i.e. all reducing steps are performed by instances of the same reducer service. In the following, the terms “reducer service” and “instance of the reducer service” are used synonymously for the sake of clarity. It is to be noted that the overall input and output of the FLOW service stay the same; only the processing is parallelized.
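The four steps can be captured in a small generic Java class, assuming the FLOW service can be modelled as a function from an input type `I` to an output type `O`. `MapReducedService` is a hypothetical name, the slaves are simulated sequentially in one process, and the same reducer is used for both the first and the second reducing step, as preferred above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical wrapper that MapReduces a service modelled as a function I -> O. */
final class MapReducedService<I, O> {
    private final Function<I, List<I>> mapper;   // same input type as the FLOW service
    private final Function<I, O> flowService;     // the unchanged FLOW service
    private final Function<List<O>, O> reducer;   // list of outputs -> one output

    MapReducedService(Function<I, List<I>> mapper, Function<I, O> flowService,
                      Function<List<O>, O> reducer) {
        this.mapper = mapper;
        this.flowService = flowService;
        this.reducer = reducer;
    }

    /** Same shape as the FLOW service itself: one I in, one O out. */
    O invoke(I input, int slaveCount) {
        List<List<O>> perSlave = new ArrayList<>();
        for (int s = 0; s < slaveCount; s++) perSlave.add(new ArrayList<>());

        List<I> intermediateInputs = mapper.apply(input);                   // step a
        for (int i = 0; i < intermediateInputs.size(); i++) {
            O result = flowService.apply(intermediateInputs.get(i));        // step b
            perSlave.get(i % slaveCount).add(result);                       // round-robin
        }
        List<O> reduced = new ArrayList<>();
        for (List<O> results : perSlave) {
            if (!results.isEmpty()) reduced.add(reducer.apply(results));     // step c
        }
        return reducer.apply(reduced);                                       // step d
    }
}
```

Because `invoke` takes an `I` and returns an `O`, callers of the wrapped service need not know that it has been parallelized. The sketch assumes a commutative and associative reducer that can also handle an empty list, should the mapper return no intermediate inputs.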

In one aspect, the mapper service and the second reducer service are executed on a master server of a server cluster, and the plurality of instances of the FLOW service and the plurality of first reducer services are executed on a plurality of slave servers of the server cluster.

In another aspect, an input signature of the mapper service conforms to an input signature of the FLOW service. Additionally or alternatively, an output signature of the reducer service conforms to an output signature of the FLOW service. An input signature (or output signature) preferably defines the number and type of arguments provided as input (or output) of a service, and hence defines the interface of the service.

Because the input signature of the mapper service preferably conforms to the input signature of the FLOW service to be parallelized, any existing FLOW service may be connected to a mapper service with the same input signature. Likewise, any existing FLOW service may be connected to a reducer service with a conforming output signature, which means that any existing FLOW service may be embedded in the present method without adapting its input or output signature or its internal processing logic. This is especially advantageous, since it greatly increases the flexibility and applicability of the present invention. An example of input and output signatures is presented in the detailed description below.

In yet another aspect, the at least one input of the FLOW service may comprise an Electronic Data Interchange (EDI) document. In this aspect, the FLOW service is preferably adapted for processing the EDI document. When the FLOW service is parallelized, an especially efficient processing of the EDI document may be achieved, similar to the aspects presented further above. However, it should be appreciated that FLOW services are by no means restricted to processing EDI documents. On the contrary, FLOW services are suitable for processing any kind of document, such as XML documents. Furthermore, FLOW services are not limited to document processing; any kind of computation logic may be implemented.

The present invention also relates to a server cluster comprising a master server and a plurality of slave servers adapted for performing any of the methods presented above.

Lastly, a computer program is provided comprising instructions adapted for implementing any of the methods described above.

4. SHORT DESCRIPTION OF THE DRAWINGS

In the following detailed description, presently preferred embodiments of the invention are further described with reference to the following figures:

FIG. 1: A schematic overview of an embodiment of the present invention;

FIG. 2: A schematic overview of a master server and a plurality of slave servers according to an embodiment of the present invention;

FIG. 3: An overview of the structure of an EDI document;

FIG. 4: An exemplary FLOW service and its related input and outputs;

FIG. 5: Another exemplary FLOW service for processing an EDI document;

FIG. 6: An overview of a graphical user interface for specifying the properties of a MapReduced FLOW service; and

FIG. 7: A class diagram of an exemplary implementation of a method of the present invention.

5. DETAILED DESCRIPTION

In the following, a presently preferred embodiment of the invention is described with respect to the processing of a large EDI document by a server cluster according to the present invention. A server cluster, also referred to as a grid and schematically shown in FIG. 2, is a distributed computing platform that allows for parallel processing. It is typically composed of a cluster of networked, loosely coupled computers acting in concert to perform very large computing or data-intensive tasks. It should be appreciated that processing an EDI document is only one of a wide variety of scenarios for the present invention and that any other type of document may be processed. Furthermore, the present invention is not limited to document processing but may be advantageously applied to any kind of complex computation, as will be demonstrated in further exemplary embodiments below.

The general structure of an EDI document is schematically depicted in FIG. 3, which shows the structure as defined, for example, by the ANSI ASC X12 standard. Accordingly, an EDI document comprises any number of transactions, which are grouped by various envelopes. On the innermost level, a transaction set is identified by the ST/SE segments shown in FIG. 3. The ST segment preferably comprises a transaction set ID, a control number and an optional implementation convention reference. The SE segment preferably comprises the number of segments included in the transaction set and the same control number as the ST segment. The second level of enveloping is the functional group envelope. Its purpose is to group similar types of transaction sets within a transmission. ANSI ASC X12 defines a number of business processes for grouping similar transaction sets, such as Planning Schedule (830), Purchase Order (850), Purchase Order Acknowledgment (855), Purchase Order Change (865), Order Status Inquiry (869) or Order Status Report (870).
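The ST/SE pairing rules above lend themselves to a simple consistency check, which a slave could, for instance, run on each transaction set it receives. This is a hypothetical sketch (`TransactionSetCheck` is not part of any product) assuming `*` as element separator and segments that have already been split; in ANSI X12, the SE01 segment count includes the ST and SE segments themselves.

```java
import java.util.List;

/** Hypothetical check of ST/SE consistency for one transaction set. */
final class TransactionSetCheck {
    static boolean isConsistent(List<String> segments) {
        if (segments.size() < 2) return false;
        String[] st = segments.get(0).split("\\*");
        String[] se = segments.get(segments.size() - 1).split("\\*");
        if (st.length < 3 || se.length < 3 || !st[0].equals("ST") || !se[0].equals("SE")) {
            return false;
        }
        int declaredCount;
        try {
            declaredCount = Integer.parseInt(se[1]); // SE01: number of included segments
        } catch (NumberFormatException e) {
            return false;
        }
        // SE02 must repeat the control number ST02
        return declaredCount == segments.size() && se[2].equals(st[2]);
    }
}
```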

The outermost level is the interchange envelope, which is defined by the ISA and IEA segments (see FIG. 3). An interchange envelope preferably encloses the data from one sender to one receiver. The ISA segment is preferably a fixed-length segment. Some items contained in the ISA/IEA segments are structured mailbox addresses of the sender and receiver, interchange control numbers, counts of the functional groups within the interchange envelope, time/date stamps and the version of the interchange envelope.
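Because the ISA segment has a fixed length of 106 characters in ANSI X12, a processor can read the delimiters of an interchange from fixed positions before splitting it. The `X12Delimiters` record below is a hypothetical sketch that assumes no line breaks or other extra characters are inserted inside the ISA segment.

```java
/** Hypothetical holder for the three delimiters declared by an ISA segment. */
record X12Delimiters(char element, char component, char segment) {
    static X12Delimiters fromIsa(String interchange) {
        if (interchange.length() < 106 || !interchange.startsWith("ISA")) {
            throw new IllegalArgumentException("not an X12 interchange");
        }
        return new X12Delimiters(
            interchange.charAt(3),    // element separator directly follows "ISA"
            interchange.charAt(104),  // ISA16: component element separator
            interchange.charAt(105)); // segment terminator closes the 106-character ISA
    }
}
```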

Typical ways to process such an EDI document are to map the data of the EDI document to another format (e.g. the format that a back-end system requires) or to map data from the EDI document to the inputs of a FLOW service, as further outlined below.

Traditional EDI processing typically processes one transaction at a time. If the EDI document size is in the order of hundreds of megabytes or gigabytes, this processing is very time-consuming. To partly alleviate this disadvantage, a cluster of high-end servers is typically deployed to process each of a plurality of EDI documents in parallel. The employment of high-end servers, however, has severe disadvantages, e.g. increased complexity if hardware or software fails during the processing and an increased cost of ownership for maintaining the high-end servers.

The present invention defines a method and server cluster for parallelizing the processing at the level of a single EDI document. As can be seen in FIG. 1, a master server M1 first receives a large EDI document 1. The EDI document is first mapped, i.e. split, at the interchange envelope boundaries into a plurality of intermediate documents 10, 11. However, it should be appreciated that in further embodiments of the present application, an EDI document may also be split, for example, at the functional group envelope level or even at the transaction set envelope level, depending on the type of EDI document.

Even more fine-grained approaches are suitable with the present invention, for example splitting the EDI document at the single transaction level, if the transactions in the EDI document are independent entities. As a result, the document could be mapped (chunked) into very small portions, leading to a high level of parallelization.

After splitting the EDI document, the master server M1 delegates the intermediate documents 10, 11 to a plurality of slave servers S1, S2 for processing. The slave servers S1, S2 process the intermediate documents 10, 11 and produce intermediate results 20-23. It should be noted that each processing of one intermediate document may result in multiple intermediate results, as further explained below.

The intermediate results 20-23 are then reduced by each of the slave servers S1, S2 in order to preferably obtain one reduced intermediate result 30, 31 per slave server S1, S2.

When the master server M1 has finished delegating the intermediate documents to the slave servers S1, S2, it preferably issues reduce calls on each of the slave servers S1, S2. The delegation is preferably invoked asynchronously, so that the master server M1, i.e. its threads, may proceed with its processing and does not have to wait for each slave server S1, S2 to finish execution, as already explained above.

The reduce calls issued by the master server M1 trigger the slave servers S1, S2 to send their respective reduced intermediate results 30, 31 back to the master server M1. The master server M1 then issues another reduce call for collating the collected reduced intermediate results 30, 31 into one final output 2. The output 2 then represents the processed EDI document 1.

It is to be noted that, since the slave servers S1, S2 each process only a portion of the overall EDI document 1, there is no need for specialized high-end hardware. Commodity machines may be used as slave servers, which greatly reduces the cost of the overall architecture.

The processing of the master and slave servers is preferably performed by a number of services. Particularly preferred is an embodiment in which the servers are webMethods Integration Servers. The webMethods Integration Server is at the core of the applicant's webMethods product portfolio. It is a Java-based, multi-platform enterprise integration engine supporting the execution of services that perform integration logic such as data mapping and communication with other systems. The Integration Server provides a graphical programming model, FLOW, that is used for performing common integration tasks such as mapping, invoking other services, looping and branching. Some of the Integration Server features include writing graphical FLOW and Java services, defining and modifying documents and mapping logic, testing, debugging and executing services, creating and configuring web services, and editing adapter services and notifications.

FIG. 4 depicts an exemplary simple FLOW service “sampleFlowService”, which takes two integers “input1” and “input2” and provides two outputs, “multiplyIntsResult” (the result of multiplying the two input integers) and “addIntsResult” (the result of adding the two input integers). When the exemplary FLOW service is executed on the Integration Server, the user may be presented with a dialog for entering values for the inputs and with another dialog showing the computation results. FIG. 4 shows a graphical user interface preferably used by the developer for specifying the mapping between the inputs, the FLOW service and the outputs.
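For readers unfamiliar with FLOW, the behaviour of “sampleFlowService” corresponds to the plain Java method below. A `Map` stands in for the Integration Server pipeline; this is an illustrative assumption and not the webMethods API.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for "sampleFlowService"; the Map plays the role of the pipeline. */
final class SampleFlowService {
    static Map<String, Object> invoke(Map<String, Object> pipeline) {
        int input1 = (Integer) pipeline.get("input1");
        int input2 = (Integer) pipeline.get("input2");
        Map<String, Object> output = new HashMap<>(pipeline); // inputs stay in the pipeline
        output.put("multiplyIntsResult", input1 * input2);
        output.put("addIntsResult", input1 + input2);
        return output;
    }
}
```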

Another example of FLOW service processing is counting the occurrences of words in a file. A common approach without parallelization is to read the file line by line and to store each word as a key in a HashMap, with its count as the value. The HashMap is first queried for the word; if the query returns “null”, the count is set to 1. Otherwise, the existing count is retrieved, incremented and put back into the HashMap. When the mapper service M10 and the reducer services R10, R11, R20 are written for this task, the mapper service may produce smaller files as output, and the reducer services merely combine the output HashMaps into a final HashMap. Accordingly, the input/output signature of the original FLOW service that performs the word count remains the same, and only the logic of the mapper and reducer operations has to be written. This is an especially advantageous aspect of the present invention, as further explained below.
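The word-count example can be sketched in Java as follows. The class and method names are hypothetical: `countWords` plays the role of the unchanged original service, `map` produces the smaller "files" (here, sublists of lines), and `reduce` merges the partial HashMaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WordCount {
    /** The original, unparallelized logic: count words line by line in a HashMap. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) continue;          // leading whitespace yields ""
                Integer count = counts.get(word);
                counts.put(word, count == null ? 1 : count + 1); // null means first occurrence
            }
        }
        return counts;
    }

    /** Mapper sketch: splits the input into smaller "files" of at most chunkSize lines. */
    static List<List<String>> map(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            chunks.add(lines.subList(i, Math.min(i + chunkSize, lines.size())));
        }
        return chunks;
    }

    /** Reducer sketch: merges each partial HashMap into one final HashMap. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }
}
```

Running `countWords` on each chunk and passing the partial maps to `reduce` produces the same map as calling `countWords` on the whole input, which is the property the MapReduced service relies on.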

Yet another example of a FLOW service is the processing of an EDI document. FIG. 5 shows an exemplary FLOW service “mainService”, which takes an EDI file name as input. It converts the EDI file format to an internal webMethods format by calling the service “ediToValues”, also depicted in FIG. 5. As output, it returns whether the input EDI file as a whole is valid after the conversion. It may further indicate the processing time consumed for executing the service (not shown in FIG. 5). The input/output signature of the FLOW service “ediToValues” is structured as follows: it accepts either an input “ediFileName” (the file name of the EDI document) or an input “edidata” (the actual EDI data represented as a string); the two inputs are mutually exclusive. If a “printTime” input is set, the time taken to execute the service is printed out. A single output “isValid” indicates whether the EDI document is valid after the conversion.

Since processing the above-described FLOW service “ediToValues” sequentially consumes a great amount of processing time and resources, the following demonstrates how the method of the present invention is applied to this existing FLOW service in order to “parallelize” it efficiently and flexibly.

Referring to FIG. 6, in the properties panel of the FLOW service“ediToValues”, the developer may provide the following properties:

-   Mapper service: a valid Integration Server service for mapping the input data
-   Reducer service: a valid Integration Server service for reducing the output data
-   Grid enabled: set to “true”
-   Throttle: the maximum number of parallel executions, including the master and the slave servers
-   Policy: this property specifies whether the intermediate results of the slave servers are held in memory (if they are of negligible size) or persisted in a file

The present invention then uses the above-specified mapper service M10 (see FIG. 1) to perform the mapping of the EDI document 1 into the plurality of intermediate documents 10, 11. The input and output signature of the mapper service M10 preferably follows certain rules:

- The input of the mapper service is preferably the same as that of the FLOW service being “parallelized” (“ediToValues” in the example). In this case, the mapper service accepts an input “ediFileName”, which matches the input of the service “ediToValues”.
- The output of the mapper service is preferably wrapped in an Integration Server document named “serviceInputData”. The content of “serviceInputData” is preferably the input of the “parallelized” FLOW service. In the example, the output “edidata” of the mapper service matches the input of the service “ediToValues”.
- Furthermore, the output of the mapper service preferably provides a boolean “isLastSplit”. The mapper service sets this value to “true” when it performs the last mapping step. The mapper service may then be called repeatedly until this value is set to “true”.
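The mapper contract can be illustrated with the following sketch, which rests on several assumptions: pipelines are modeled as Map<String, Object> instead of IData, the mapper receives the raw EDI text instead of a file name, every segment is terminated by “~” (common in X12 documents), and each split is one ST…SE transaction set. The class EdiTransactionSetMapper is hypothetical. The 1-based mapIteration argument mirrors the MAP_ITERATION counter that JobInProgress places in the pipeline before each mapper call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapper stand-in; pipelines are Map<String, Object> instead of IData.
final class EdiTransactionSetMapper {

    // Collects each ST...SE transaction set, assuming "~" terminates every segment.
    static List<String> transactionSets(String edidata) {
        List<String> sets = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : edidata.split("~")) {
            String segment = raw.trim();           // tolerate line breaks between segments
            if (segment.startsWith("ST*")) {
                current = new StringBuilder();     // a new transaction set begins
            }
            if (current != null) {
                current.append(segment).append('~');
                if (segment.startsWith("SE*")) {   // the transaction set is complete
                    sets.add(current.toString());
                    current = null;
                }
            }
        }
        return sets;
    }

    // One mapper call: returns split number mapIteration (1-based) and flags the last one.
    static Map<String, Object> map(String edidata, int mapIteration) {
        List<String> sets = transactionSets(edidata);
        if (mapIteration < 1 || mapIteration > sets.size()) {
            throw new IllegalArgumentException("no split " + mapIteration);
        }
        Map<String, Object> serviceInputData = new HashMap<>();
        serviceInputData.put("edidata", sets.get(mapIteration - 1)); // input of "ediToValues"
        Map<String, Object> output = new HashMap<>();
        output.put("serviceInputData", serviceInputData);
        if (mapIteration == sets.size()) {
            output.put("isLastSplit", Boolean.TRUE);
        }
        return output;
    }
}
```

For brevity the sketch drops the enclosing interchange and functional group envelopes and re-parses the document on every call; a real mapper would more likely keep the envelopes and read the file incrementally.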

The input and output signatures of the reducer services R10, R11, R20 preferably also follow certain rules:

- The input of the reducer service is wrapped in an Integration Server document list called “reduceInputData”. The document list is preferably an array of documents. The content of each entry in the document list may be the output of the FLOW service to be “parallelized”. In the example, an input “isValid” of the reducer service matches the output of the service “ediToValues”.
- The input of the reducer service may further provide a boolean “isLastReduceStep”. This value is set to true if the reducer processes the last reduce call, which can be used to perform clean-up activities in the reducer service.
- The output of the reducer service should be the output of the service that is to be “parallelized”. In the example, the output “isValid” matches the output of the service “ediToValues”.

As can be seen, the input and output signatures of the mapper and reducer services defined above conform to the input and output signature of the FLOW service. This has the advantage that any existing FLOW service may easily be “parallelized”, since neither the signature nor the internal processing of the FLOW service itself has to be adapted. Instead, the mapper and reducer services are simply “plugged in” before and after the FLOW service, respectively.
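A reducer for “ediToValues” only has to combine “isValid” flags: the document is valid if every split is valid. The sketch below assumes this semantics and models “reduceInputData” as a list of maps; IsValidReducer is a hypothetical name. Because logical AND is associative and commutative, the result does not depend on how the slaves group or order the intermediate results.

```java
import java.util.List;
import java.util.Map;

// Hypothetical reducer stand-in for "ediToValues"; reduceInputData is a list of maps.
final class IsValidReducer {

    static Map<String, Object> reduce(List<Map<String, Object>> reduceInputData,
                                      boolean isLastReduceStep) {
        boolean allValid = true;
        for (Map<String, Object> entry : reduceInputData) {
            // The document is valid only if every split (or earlier partial result) is valid.
            allValid &= Boolean.TRUE.equals(entry.get("isValid"));
        }
        // isLastReduceStep is unused here; a real reducer could use it to trigger clean-up.
        return Map.<String, Object>of("isValid", allValid);
    }
}
```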

The approach presented above is especially advantageous if the reduce operations are associative and commutative. For example, when calculating the number of prime numbers in the range 1 to 100, two input splits may be employed: the first covering 1 to 50 and the second covering 51 to 100. The intermediate outputs in this example would be “x” and “y”, representing the number of prime numbers in each split. The reduce operation would be the addition x + y, which is associative and commutative.
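The prime-number example can be written out in plain Java as follows. This is a single-process sketch with hypothetical names; it only shows the split, process and reduce-by-addition structure, not the distribution across servers.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical single-process sketch of the prime-counting example.
final class PrimeCountMapReduce {

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // The "service" applied to one input split [from, to]: how many primes it contains.
    static int countPrimes(int from, int to) {
        return (int) IntStream.rangeClosed(from, to).filter(PrimeCountMapReduce::isPrime).count();
    }

    // The reduce operation: addition, which is associative and commutative.
    static int reduce(List<Integer> intermediateOutputs) {
        return intermediateOutputs.stream().mapToInt(Integer::intValue).sum();
    }

    static int run() {
        int x = countPrimes(1, 50);   // first input split
        int y = countPrimes(51, 100); // second input split
        return reduce(List.of(x, y));
    }
}
```

Here countPrimes(1, 50) yields 15 and countPrimes(51, 100) yields 10, so run() returns 25; reduce(List.of(10, 15)) gives the same total, reflecting the commutativity of addition.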

This signature conformance is one of the advantages of the present invention over the conventional MapReduce algorithm known from the prior art. While the conventional MapReduce map step written by the user takes an input pair and produces a set of intermediate key/value pairs, the mapper service on the Integration Server according to the present invention follows a standard signature and only “chunks”, i.e. splits, the input data. Furthermore, the conventional MapReduce map step typically runs on the slaves, each taking an input pair and producing a set of intermediate key/value pairs, whereas the mapper service on the Integration Server preferably executes on the master server M1, which then delegates the chunked input data to the slave servers S1, S2 for executing the actual services. This is especially flexible and makes FLOW services easy to develop and maintain: in the conventional MapReduce algorithm, there is no service that is “parallelized”; instead, the desired operation is a programming construct defined through mappers and reducers. Unlike on the Integration Server, no service corresponds to the desired operation. This makes the claimed method easy to understand and especially user-friendly.

The conventional MapReduce reduce step written by the user takes an intermediate key and a set of values for that key and merges the values to form a possibly smaller set of values. In contrast, when the reducer service is executed on the Integration Server according to the present invention, the master server M1 preferably issues a reduce call to all the slave servers S1, S2 to collate the related intermediate results. When the master server M1 receives the results from the slave servers S1, S2 after the reduce operation on each slave server, it internally combines them into one final result on the master server M1. This essentially makes the reduce operation a two-step process, performed first on the slave servers S1, S2 and then on the master server M1, which saves network bandwidth and thus further decreases processing time and improves resource utilization, as already explained above.
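The two-step reduce can be sketched without any networking: each inner list stands for the intermediate results held by one slave server, which are folded locally before a single value per slave is combined on the master. TwoStepReduce is a hypothetical name, and the operator is assumed to be associative; if results may arrive from the slaves in any order, it must also be commutative.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch: each inner list holds the intermediate results of one slave server.
final class TwoStepReduce {

    // Folds a non-empty list with the given operator.
    static <T> T fold(List<T> values, BinaryOperator<T> op) {
        return values.stream().reduce(op)
                .orElseThrow(() -> new IllegalArgumentException("nothing to reduce"));
    }

    static <T> T reduce(List<List<T>> resultsPerSlave, BinaryOperator<T> op) {
        // Step 1, on each slave: only one reduced value per slave travels to the master.
        List<T> reducedPerSlave = resultsPerSlave.stream()
                .map(slaveResults -> fold(slaveResults, op))
                .collect(Collectors.toList());
        // Step 2, on the master: combine the reduced intermediate results into the final result.
        return fold(reducedPerSlave, op);
    }
}
```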

Further features of the server cluster of the present invention are possible. For example, the master Integration Server may maintain a configuration file comprising a list of available slave servers, together with the information the master needs to delegate processing to the slave servers. This simple facility can easily be extended to identify slave nodes dynamically. For example, when a slave server starts up, it may broadcast its identification to all machines in the server cluster, and the master server can then register that slave server as a potential slave.
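A minimal sketch of such a slave list, assuming configuration lines of the form “host:port” and “#” comments. SlaveRegistry is hypothetical: its register( ) method stands in for both the configuration file and the start-up broadcast, and nextHost( ) is only a guess at the round-robin behavior suggested by the RoundRobinHostSelector class used in the JobClient listing, whose implementation is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry of slave servers, selected round-robin.
final class SlaveRegistry {
    private final List<String> hosts = new ArrayList<>();
    private int next = 0; // index of the slave to use next

    // Builds the registry from lines such as "slave1:5555"; skips blanks and '#' comments.
    static SlaveRegistry fromConfig(List<String> lines) {
        SlaveRegistry registry = new SlaveRegistry();
        for (String line : lines) {
            String entry = line.trim();
            if (!entry.isEmpty() && !entry.startsWith("#")) {
                registry.register(entry);
            }
        }
        return registry;
    }

    // Used for configured slaves and, in the dynamic variant, when a slave announces itself.
    synchronized void register(String hostAndPort) {
        if (!hosts.contains(hostAndPort)) {
            hosts.add(hostAndPort);
        }
    }

    // Returns the next known slave in round-robin order.
    synchronized String nextHost() {
        if (hosts.isEmpty()) {
            throw new IllegalStateException("no slave servers registered");
        }
        String host = hosts.get(next);
        next = (next + 1) % hosts.size();
        return host;
    }
}
```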

In the following, an exemplary Java implementation of an embodiment of the present invention is presented, the main components of which are depicted in FIG. 7. However, it should be appreciated that the present invention is restricted neither to the programming language Java nor to the concrete implementation shown below.

The class JobClient shown in FIG. 7 serves to define a “job”, which represents one execution of a data-processing run according to the present invention. An exemplary implementation of JobClient is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.data.IData;
import com.wm.lang.ns.NSName;
import com.wm.util.UUID;

public class JobClient {
  private JobConfiguration jobConf = null;
  private JobInProgress jobInProgress = null;
  protected NSName mapper = null;
  protected NSName reducer = null;
  protected NSName mapTaskName = null;
  protected int throttle = 0;
  protected boolean isPersistMapIntResult = false;
  protected boolean isStoreMapIntResult = false;
  protected String jobId = null;
  protected ClusterHosts clusterHosts = null;
  protected HostSelector hostSelector = null;

  public JobClient(NSName mapper, NSName reducer, NSName mapTaskName, int throttle,
                   boolean isPersistMapIntResult) {
    this.mapper = mapper;
    this.reducer = reducer;
    this.mapTaskName = mapTaskName;
    this.throttle = throttle;
    this.isPersistMapIntResult = isPersistMapIntResult;
    jobConf = new JobConfiguration();
    clusterHosts = new ClusterHosts();
    hostSelector = new RoundRobinHostSelector();
    hostSelector.setClusterHosts(clusterHosts);
    jobConf.readJobConfiguration(clusterHosts);
    jobId = UUID.generate(); // better than an integer since it will be unique across server re-starts
  }

  public JobClient(NSName mapper, NSName reducer, NSName mapTaskName, int throttle,
                   boolean isPersistMapIntResult, boolean isStoreMapIntResult) {
    this(mapper, reducer, mapTaskName, throttle, isPersistMapIntResult);
    this.isStoreMapIntResult = isStoreMapIntResult;
  }

  public void submitJob(IData pipeline) {
    jobInProgress = new JobInProgress(this, pipeline);
    jobInProgress.executeAndTrackJob();
  }
}
```

As can be seen, when a new instance of JobClient is created by invoking its constructor, it takes as input the parameters mapper (the mapper implementation to be used for the current job), reducer (the reducer implementation to be used), mapTaskName (the name of the service executed by each map task, i.e. the FLOW service being “parallelized”), throttle (the number of desired parallel service executions) and isPersistMapIntResult (whether the intermediate results should be stored in the memory of the slaves or in a persistent file system). The submitJob( ) method takes a pipeline parameter of type IData, which preferably comprises the input data to be processed, e.g. the data of the EDI document. submitJob( ) then creates a new JobInProgress instance and invokes its executeAndTrackJob( ) method.

An exemplary implementation of JobInProgress is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import java.util.ArrayList;
import com.wm.app.b2b.server.InvokeState;
import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.Session;
import com.wm.app.b2b.server.ThreadManager;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;

public class JobInProgress implements Runnable {
  private TaskInProgress taskInProgress = null;
  private int mapTaskID = 0;
  private int reduceTaskID = 0;
  protected JobClient jobClient = null;
  protected TaskTracker taskTracker = new TaskTracker();
  private InvokeState invokeState = null;
  private Session session = null;
  private boolean isJobDone = false;
  IData mapperPipeline = null;

  public JobInProgress(JobClient jobClient, IData mapperPipeline) {
    this.jobClient = jobClient;
    this.mapperPipeline = mapperPipeline;
  }

  public void executeAndTrackJob() {
    if (InvokeState.getCurrentState() != null) {
      invokeState = (InvokeState) InvokeState.getCurrentState().clone();
    }
    if (InvokeState.getCurrentSession() != null) {
      session = (Session) InvokeState.getCurrentSession().clone();
    }
    //long startTime = System.currentTimeMillis();
    ThreadManager.runTarget(this);
    synchronized (this) {
      while (isJobDone == false) {
        try {
          this.wait();
        } catch (InterruptedException e) {
        }
      }
    }
    //long endTime = System.currentTimeMillis();
    //System.out.println("The time taken in mill secs: " + (endTime - startTime));
  }

  public void run() {
    boolean isAllMapTasksDispatched = false;
    ArrayList<Integer> mapTaskIdsCompleteList = null;
    Integer[] mapTaskIdsComplete = null;
    IData reducerPipeline = mapperPipeline;
    IDataCursor reducerPipelineCur = null;
    InvokeState.setCurrentState(invokeState);
    InvokeState.setCurrentSession(session);
    taskInProgress = new TaskInProgress();
    int numMapTasksComplete = 0;
    while (true) {
      // check if all the map tasks are dispatched
      if (!isAllMapTasksDispatched) {
        // Splitter
        mapTaskID++;
        IDataCursor mapperPipelineCursor = mapperPipeline.getCursor();
        IDataUtil.put(mapperPipelineCursor, MapReduceConstants.MAP_ITERATION, mapTaskID);
        try {
          Service.doInvoke(jobClient.mapper, mapperPipeline);
        } catch (Exception e) {
          e.printStackTrace();
          break;
        }
        // Endpoint Service
        IData serviceInputData = (IData) IDataUtil.get(mapperPipelineCursor, MapReduceConstants.SERVICE_INPUT_DATA);
        IDataUtil.remove(mapperPipelineCursor, MapReduceConstants.SERVICE_INPUT_DATA);
        isAllMapTasksDispatched = IDataUtil.get(mapperPipelineCursor, MapReduceConstants.IS_LAST_SPLIT) == null ? false : true;
        if (isAllMapTasksDispatched) {
          IDataUtil.remove(mapperPipelineCursor, MapReduceConstants.IS_LAST_SPLIT);
          IDataUtil.remove(mapperPipelineCursor, MapReduceConstants.MAP_ITERATION);
        }
        mapperPipelineCursor.destroy();
        //System.out.println("spawning map task:" + mapTaskID);
        MapTask mapTask = TaskFactory.createMapTask(this, mapTaskID);
        mapTask.setTaskInput(serviceInputData);
        mapTask.setHostSelector(jobClient.hostSelector);
        // throttle the max number of mapTasks executed in parallel;
        // this takes into account the previous map tasks that were submitted and completed
        synchronized (taskTracker) {
          try {
            numMapTasksComplete = taskTracker.getNumCompletedMapTasks();
            while ((mapTaskID - numMapTasksComplete) > jobClient.throttle) {
              //System.out.println("waiting for some map tasks to complete, num mapTasks onGoing:" + (mapTaskID - numMapTasksComplete));
              taskTracker.wait();
              numMapTasksComplete = taskTracker.getNumCompletedMapTasks();
            }
          } catch (InterruptedException e) {
            //TODO
          }
        }
        taskInProgress.executeTask(mapTask, invokeState, session);
      } else if (jobClient.reducer != null && isAllMapTasksDispatched) {
        // Reducer
        synchronized (taskTracker) {
          mapTaskIdsCompleteList = taskTracker.getCompletedMapTasks();
          while (mapTaskIdsCompleteList == null || mapTaskIdsCompleteList.size() == 0) {
            try {
              taskTracker.wait();
            } catch (InterruptedException e) {
            }
            mapTaskIdsCompleteList = taskTracker.getCompletedMapTasks();
          }
        }
        if (mapTaskIdsCompleteList != null && mapTaskIdsCompleteList.size() > 0) {
          mapTaskIdsComplete = mapTaskIdsCompleteList.toArray(new Integer[0]);
          reduceTaskID += mapTaskIdsComplete.length;
          //System.out.println("processing reduce.tasks :" + mapTaskIdsComplete.length);
          if (mapTaskID == reduceTaskID) {
            reducerPipelineCur = reducerPipeline.getCursor();
            IDataUtil.put(reducerPipelineCur, MapReduceConstants.IS_LAST_REDUCE_STEP, true);
            reducerPipelineCur.destroy();
          }
          ReduceTask reduceTask = TaskFactory.createReduceTask(this, reduceTaskID);
          reduceTask.setTaskInput(reducerPipeline);
          reduceTask.setCompletedMapTasks(mapTaskIdsComplete);
          reduceTask.runTask();
          // If, for some reason, one of the reduce tasks returned null,
          // this should not nullify the output of all other reduce tasks
          // that was collected before.
          // TODO when the whole framework is done, the failed reduce task (that returned null)
          // should be re-executed
          IData tempReduceOutput = reduceTask.getTaskResult();
          if (tempReduceOutput != null) {
            reducerPipeline = tempReduceOutput;
          }
        }
        // All map and reduce tasks are complete
        if (isAllMapTasksDispatched && mapTaskID == reduceTaskID) {
          IDataCursor reduceServicePipelineCur = reducerPipeline.getCursor();
          IData reduceIntermediateOutput = (IData) IDataUtil.get(reduceServicePipelineCur, MapReduceConstants.SERVICE_OUTPUT_DATA);
          IDataUtil.remove(reduceServicePipelineCur, MapReduceConstants.SERVICE_OUTPUT_DATA);
          IDataUtil.remove(reduceServicePipelineCur, MapReduceConstants.IS_LAST_REDUCE_STEP);
          reduceServicePipelineCur.destroy();
          // Merge the final result into the pipeline data
          IDataUtil.merge(reduceIntermediateOutput, mapperPipeline);
          break;
        }
      }
    }
    final String tabSpace = "    ";
    // print the status of the map tasks and reduce tasks
    System.out.println("Num Map Tasks : " + mapTaskID + tabSpace + "Num Reduce Tasks : " + reduceTaskID);
    // clean up once the job is completed
    InvokeState.setCurrentState(null);
    InvokeState.setCurrentSession(null);
    jobClient.clusterHosts.freeClusterHosts();
    synchronized (this) {
      isJobDone = true;
      this.notifyAll();
    }
  }
}
```

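As can be seen, the run( ) method of JobInProgress in this example contains the main code for processing the input file, i.e. the steps of splitting (the invocation of the mapper service via Service.doInvoke( )), executing the “parallelized” FLOW services (the creation of one MapTask per split, which is handed to taskInProgress.executeTask( )) and reducing (the creation and execution of a ReduceTask for each batch of completed map tasks).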
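The control flow of run( ) can be approximated in a single JVM with standard java.util.concurrent primitives. This sketch swaps the wait( )/notifyAll( ) handshake on TaskTracker for a Semaphore holding “throttle” permits, and it reduces all results once at the end rather than in batches; ThrottledJob is a hypothetical name, and the splits are assumed to be produced up front rather than by repeated mapper calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical single-JVM approximation of JobInProgress.run().
final class ThrottledJob {

    // Runs service on every split with at most `throttle` concurrent executions, then reduces.
    // Returns null if there are no splits.
    static <I, O> O run(List<I> splits, Function<I, O> service,
                        BinaryOperator<O> reducer, int throttle) {
        Semaphore permits = new Semaphore(throttle);
        ExecutorService pool = Executors.newCachedThreadPool();
        List<Future<O>> pending = new ArrayList<>();
        try {
            for (I split : splits) {
                permits.acquire(); // waits, like taskTracker.wait(), until a slot is free
                pending.add(pool.submit(() -> {
                    try {
                        return service.apply(split);
                    } finally {
                        permits.release(); // plays the role of notifyAll() on completion
                    }
                }));
            }
            O result = null;
            for (Future<O> future : pending) {
                O value = future.get();
                result = (result == null) ? value : reducer.apply(result, value);
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the job", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a service invocation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, run(List.of(1, 2, 3, 4, 5), x -> x * x, Integer::sum, 2) returns 55 while never running more than two service invocations at the same time.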

An exemplary implementation of MapTask, which executes the “parallelized” service for one intermediate input, is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.app.b2b.server.InvokeState;
import com.wm.app.b2b.server.ThreadManager;
import com.wm.data.IData;
import com.wm.lang.ns.NSName;

public class MapTask extends AbstractTask implements Runnable {
  public MapTask(int id, TaskTracker taskTracker, NSName mapSvcName, TaskResultStoreConfig policy) {
    super(id, taskTracker, mapSvcName, policy);
  }

  public void runTask() {
    ThreadManager.runTarget(this);
  }

  public void run() {
    InvokeState.setCurrentState(invokeState);
    InvokeState.setCurrentSession(session);
    taskStatus = new TaskStatus();
    IData output = null;
    try {
      output = RPC.remoteInvoke(hostSelector.getHostInfoEntry(), serviceName, taskInput);
      taskInput = null;
    } catch (Exception e) {
      output = null;
    }
    storeTaskResult(output);
    synchronized (taskTracker) {
      taskTracker.addMapTaskToCompletedList(this);
      setTaskStatus(TaskStatus.TASK_COMPLETE);
      taskTracker.notifyAll();
    }
    InvokeState.setCurrentState(null);
    InvokeState.setCurrentSession(null);
    //System.out.println("map task time is :" + totalSvcTime);
  }
}
```

As can be seen, when a MapTask is executed, i.e. when its run( ) method is invoked, MapTask invokes the remoteInvoke( ) method of the RPC class, which takes three input parameters: hostSelector.getHostInfoEntry( ) (the server selected to execute the service), serviceName (the service to be invoked) and taskInput. taskInput is an attribute inherited from the superclass AbstractTask and preferably comprises the input to be processed, e.g. the data of the EDI document.

An exemplary implementation of RPC and its remoteInvoke( ) method is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.app.b2b.client.Context;
import com.wm.app.b2b.client.ServiceException;
import com.wm.app.b2b.server.ACLManager;
import com.wm.app.b2b.server.BaseService;
import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.invoke.InvokeManager;
import com.wm.app.b2b.server.ns.Namespace;
import com.wm.lang.ns.NSName;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;

public class RPC {
  private static final String DEFAULT_RPC_SERVER = "localhost";
  private static final int DEFAULT_RPC_PORT = 5555;
  private static final String DEFAULT_RPC_UID = "Administrator";
  private static final String DEFAULT_RPC_PASSWD = "manage";
  private static String rpcServer = null;
  private static int rpcPort = -1;
  private static String rpcUID = null;
  private static String rpcPasswd = null;
  private static Object lock = new Object();

  public static IData remoteInvoke(HostInfo hostInfoEntry, NSName mapTaskName, IData input) throws Exception {
    if (hostInfoEntry == null) {
      return null;
    }
    boolean isPrimary = hostInfoEntry.isPrimary();
    if (isPrimary) {
      return Service.doInvoke(mapTaskName, input);
    } else {
      Context context = null;
      // synchronize here, otherwise more than one context
      // may be created for the host
      synchronized (lock) {
        if (!hostInfoEntry.isConnected) {
          String hostName = hostInfoEntry.getHostName();
          int port = hostInfoEntry.getPort();
          context = new Context();
          context.connect(hostName + ":" + port, DEFAULT_RPC_UID, DEFAULT_RPC_PASSWD);
          hostInfoEntry.setContext(context);
          hostInfoEntry.setConnected(true);
        } else {
          context = hostInfoEntry.getContext();
        }
      }
      if (context != null) {
        return context.invoke(mapTaskName, input);
      }
    }
    return null;
  }
}
```
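The synchronized block in remoteInvoke( ) ensures that only one Context is created per host, at the cost of a single global lock for all hosts. The same idea can be sketched with ConcurrentHashMap.computeIfAbsent, which applies its mapping function at most once per key; ConnectionCache and its connector function are hypothetical and merely stand in for creating and connecting a Context.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-host connection cache; C stands for any client type, e.g. a Context.
final class ConnectionCache<C> {
    private final ConcurrentHashMap<String, C> connections = new ConcurrentHashMap<>();
    private final Function<String, C> connector; // opens a connection to "host:port"

    ConnectionCache(Function<String, C> connector) {
        this.connector = connector;
    }

    // Returns the host's connection, creating it on first use; the connector runs
    // at most once per "host:port" key, without a lock shared by all hosts.
    C forHost(String hostName, int port) {
        return connections.computeIfAbsent(hostName + ":" + port, connector);
    }

    int size() {
        return connections.size();
    }
}
```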

An exemplary implementation of ReduceTask is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.app.b2b.server.Service;
import com.wm.data.IData;
import com.wm.data.IDataCursor;
import com.wm.data.IDataUtil;
import com.wm.lang.ns.NSName;

public class ReduceTask extends AbstractTask {
  private Integer[] mapTaskIDArray = null;
  private boolean inMemory = false;

  public ReduceTask(int id, TaskTracker taskTracker, NSName reduceSvcName, TaskResultStoreConfig resultStoragePolicy) {
    super(id, taskTracker, reduceSvcName, resultStoragePolicy);
    if (resultStoragePolicy instanceof TaskResultMemStoreConfig) {
      inMemory = true;
    }
  }

  public void setCompletedMapTasks(Integer[] mapTaskIDArray) {
    this.mapTaskIDArray = mapTaskIDArray;
  }

  public void runTask() {
    if (mapTaskIDArray == null || mapTaskIDArray.length == 0) {
      return;
    }
    if (inMemory) {
      invokeReduceService(mapTaskIDArray);
    } else {
      IDataCursor reduceServicePipelineCur = taskInput.getCursor();
      boolean isLastReduceBatch = IDataUtil.getBoolean(reduceServicePipelineCur, MapReduceConstants.IS_LAST_REDUCE_STEP);
      IDataUtil.remove(reduceServicePipelineCur, MapReduceConstants.IS_LAST_REDUCE_STEP);
      reduceServicePipelineCur.destroy();
      for (int i = 0; i < mapTaskIDArray.length; i++) {
        if (i == mapTaskIDArray.length - 1 && isLastReduceBatch) {
          reduceServicePipelineCur = taskInput.getCursor();
          IDataUtil.put(reduceServicePipelineCur, MapReduceConstants.IS_LAST_REDUCE_STEP, true);
          reduceServicePipelineCur.destroy();
        }
        invokeReduceService(new Integer[] {mapTaskIDArray[i]});
      }
    }
    storeTaskResult(taskInput);
  }

  private void invokeReduceService(Integer[] mapTaskIDs) {
    IData[] reduceInputDataArray = null;
    IDataCursor reduceServicePipelineCur = taskInput.getCursor();
    IData reduceIntermediateOutput = (IData) IDataUtil.get(reduceServicePipelineCur, MapReduceConstants.SERVICE_OUTPUT_DATA);
    int length = 0;
    if (reduceIntermediateOutput != null) {
      length = mapTaskIDs.length + 1;
    } else {
      length = mapTaskIDs.length;
    }
    int count = 0;
    reduceInputDataArray = new IData[length];
    if (reduceIntermediateOutput != null) {
      reduceInputDataArray[count++] = reduceIntermediateOutput;
    }
    for (Integer mapTaskID : mapTaskIDs) {
      synchronized (taskTracker) {
        reduceInputDataArray[count++] = taskTracker.removeMapTask(mapTaskID).getTaskResult();
      }
    }
    IDataUtil.put(reduceServicePipelineCur, MapReduceConstants.REDUCE_INPUT_DATA, reduceInputDataArray);
    if (reduceInputDataArray != null) {
      try {
        Service.doInvoke(serviceName, taskInput);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
    reduceInputDataArray = null;
    IDataUtil.remove(reduceServicePipelineCur, MapReduceConstants.REDUCE_INPUT_DATA);
    reduceServicePipelineCur.destroy();
  }
}
```
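invokeReduceService( ) reduces each batch of finished map results together with the output of the previous reduce call, so the partial result is carried forward until the last batch. The generic sketch below isolates that idea; IncrementalReducer is a hypothetical name, the reduce service is modeled as a Function over a list, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the batch-wise reduce performed by ReduceTask.
final class IncrementalReducer<R> {
    private final Function<List<R>, R> reduceService; // stands in for the reducer FLOW service
    private R previousOutput;                         // plays the role of SERVICE_OUTPUT_DATA

    IncrementalReducer(Function<List<R>, R> reduceService) {
        this.reduceService = reduceService;
    }

    // Reduces one batch of completed map results together with the previous reduce output.
    R reduceBatch(List<R> completedMapResults) {
        List<R> reduceInputData = new ArrayList<>(completedMapResults.size() + 1);
        if (previousOutput != null) {
            reduceInputData.add(previousOutput); // earlier partial result first, as in the listing
        }
        reduceInputData.addAll(completedMapResults);
        previousOutput = reduceService.apply(reduceInputData);
        return previousOutput;
    }
}
```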

Both MapTask and ReduceTask have the abstract class AbstractTask as their superclass, i.e. they inherit its attributes and its setter and getter methods, which are shown in the following exemplary code listing of AbstractTask:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.app.b2b.server.InvokeState;
import com.wm.app.b2b.server.Session;
import com.wm.data.IData;
import com.wm.lang.ns.NSName;

public abstract class AbstractTask implements Task {
  private int taskID = -1;
  protected TaskStatus taskStatus = null;
  protected TaskTracker taskTracker = null;
  protected InvokeState invokeState = null;
  protected Session session = null;
  protected IData taskInput = null;
  protected NSName serviceName = null;
  protected HostSelector hostSelector = null;
  private TaskResultStoreConfig taskResultStoreCfg = null;

  public AbstractTask(int id, TaskTracker taskTracker, NSName serviceName, TaskResultStoreConfig taskResultStoreCfg) {
    this.taskID = id;
    this.taskTracker = taskTracker;
    this.serviceName = serviceName;
    this.taskResultStoreCfg = taskResultStoreCfg;
  }

  public void setInvokeState(InvokeState invokeState) {
    this.invokeState = invokeState;
  }

  public void setInvokeSession(Session session) {
    this.session = session;
  }

  public void setTaskID(int taskID) {
    this.taskID = taskID;
  }

  public int getTaskID() {
    return taskID;
  }

  public void setTaskInput(IData taskInput) {
    this.taskInput = taskInput;
  }

  public void setTaskStatus(int status) {
    taskStatus.setTaskStatus(status);
  }

  public void storeTaskResult(IData result) {
    this.taskResultStoreCfg.storeResult(result);
  }

  public IData getTaskResult() {
    return this.taskResultStoreCfg.fetchResultAndDestroy();
  }

  public void setHostSelector(HostSelector hostSelector) {
    this.hostSelector = hostSelector;
  }
}
```

As can be seen, AbstractTask itself implements the interface Task, an exemplary definition of which is shown in the following code listing:

```java
package com.wm.app.b2b.server.mapred;

import com.wm.app.b2b.server.InvokeState;
import com.wm.app.b2b.server.Session;
import com.wm.data.IData;

public interface Task {
  public void setInvokeState(InvokeState invokeState);
  public void setInvokeSession(Session session);
  public void setHostSelector(HostSelector hostSelector);
  public int getTaskID();
  public void setTaskID(int taskID);
  public void setTaskStatus(int status);
  public void setTaskInput(IData taskInput);
  public void runTask();
  public void storeTaskResult(IData result);
  public IData getTaskResult();
}
```
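AbstractTask delegates storeTaskResult( ) and getTaskResult( ) to a TaskResultStoreConfig, and ReduceTask checks for a TaskResultMemStoreConfig subclass, but neither class is reproduced in this section. The following sketch shows one plausible shape for the two storage policies (memory versus file) using String results instead of IData; ResultStore, MemoryResultStore and FileResultStore are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical result store modeled on storeResult()/fetchResultAndDestroy().
interface ResultStore {
    void storeResult(String result);
    String fetchResultAndDestroy();
}

// Keeps the intermediate result in memory (suitable for results of negligible size).
final class MemoryResultStore implements ResultStore {
    private String result;

    public void storeResult(String result) {
        this.result = result;
    }

    public String fetchResultAndDestroy() {
        String r = result;
        result = null; // release the reference once the result has been fetched
        return r;
    }
}

// Persists the intermediate result to a temporary file that is deleted when fetched.
final class FileResultStore implements ResultStore {
    private Path file;

    public void storeResult(String result) {
        try {
            file = Files.createTempFile("maptask-", ".result");
            Files.writeString(file, result, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String fetchResultAndDestroy() {
        if (file == null) {
            return null;
        }
        try {
            String r = Files.readString(file, StandardCharsets.UTF_8);
            Files.delete(file);
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            file = null;
        }
    }
}
```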

A number of further infrastructure classes and interfaces are required in the exemplary implementation of the present invention, which are shown in the following code listings:

1. A method for MapReducing the processing of an Electronic Data Interchange (EDI) document (1), the method comprising the following steps: a. mapping the EDI document (1) into a plurality of intermediate documents (10, 11); b. processing the intermediate documents (10, 11) to produce a plurality of intermediate results (20-23); c. reducing the plurality of intermediate results (20-23) to produce a plurality of reduced intermediate results (30, 31); and d. reducing the reduced intermediate results (30, 31) to produce a final result (2) representing the processed EDI document (1).
2. The method of claim 1, wherein the EDI document (1) is mapped such that each of the intermediate documents (10, 11) comprises at least one of a plurality of interchange envelopes, at least one of a plurality of functional group envelopes and/or at least one of a plurality of transaction set envelopes of the EDI document (1).
3. The method of claim 1, wherein steps a. and d. are performed by a master server (M1) of a server cluster and wherein steps b. and c. are performed by a plurality of slave servers (S1, S2) of the server cluster, each slave server (S1, S2) processing one or more intermediate documents (10, 11) and reducing one or more intermediate results (20-23).
4. The method of claim 3, further comprising the step of sending the intermediate documents (10, 11) to the slave servers (S1, S2) from the master server (M1) by an asynchronous invocation.
5. The method of claim 3, wherein the EDI document (1) is stored in a distributed file system accessible to the slave servers (S1, S2) and wherein the method further comprises the step of sending a reference to the intermediate documents (10, 11) to the slave servers (S1, S2) from the master server (M1).
6. The method of claim 1, wherein each of the intermediate results (20-23) comprises an identifier relating the respective intermediate result (20-23) to the EDI document (1).
7. The method of claim 1, wherein a processing logic for performing the processing of the slave servers (S1, S2) in step b. is distributed to the slave servers (S1, S2) during runtime.
8. A server cluster comprising a master server (M1) and a plurality of slave servers (S1, S2) adapted for performing a method of claim 1.
9. A method for MapReducing the processing of at least one input (1) of a FLOW service, the method comprising the steps of: a. mapping the at least one input (1) of the FLOW service into a plurality of intermediate inputs (10, 11) by a mapper service (M10); b. executing a plurality of instances (F10, F10′) of the FLOW service, the instances (F10, F10′) of the FLOW service processing the intermediate inputs (10, 11) to produce a plurality of intermediate results (20-23); c. reducing the intermediate results (20-23) into a plurality of reduced intermediate results (30, 31) by a plurality of first reducer services (R10, R11); and d. reducing the reduced intermediate results (30, 31) to produce a final output (2) of the FLOW service from the reduced intermediate results (30, 31) by a second reducer service (R20).
10. The method of claim 9, wherein the mapper service (M10) and the second reducer service (R20) are executed on a master server (M1) of a server cluster and wherein the plurality of instances (F10, F10′) of the FLOW service and the plurality of first reducer services (R10, R11) are executed on a plurality of slave servers (S1, S2) of the server cluster.
11. The method of claim 9, wherein an input signature of the mapper service (M10) conforms to an input signature of the FLOW service.
12. The method of claim 9, wherein an output signature of the reducer services (R10, R11, R20) conforms to an output signature of the FLOW service.
13. The method of claim 9, wherein at least one input of the FLOW service comprises an Electronic Data Interchange (EDI) document (1).
14. A server cluster comprising a master server (M1) and a plurality of slave servers (S1, S2) adapted for performing a method of claim 9.
15. A computer program comprising instructions adapted for implementing a method of claim 1.